Coresets for Scalable Bayesian Logistic Regression
The use of Bayesian methods in large-scale data settings is attractive because of the rich hierarchical models, uncertainty quantification, and prior specification they provide. Standard Bayesian inference algorithms are computationally expensive, however, making their direct application to large datasets difficult or infeasible. Recent work on scaling Bayesian inference has focused on modifying the underlying algorithms to, for example, use only a random data subsample at each iteration. We leverage the insight that data is often redundant to instead obtain a weighted subset of the data (called a coreset) that is much smaller than the original dataset. We can then use this small coreset in any number of existing posterior inference algorithms without modification.
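The core idea in the abstract — replace the dataset with a small weighted subset whose weighted log-likelihood approximates the full one — can be sketched in a few lines. This is an illustrative sketch only, using uniform sampling probabilities for simplicity; the paper's actual construction uses sensitivity-based (non-uniform) importance weights, and all names and sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logistic-regression data (hypothetical sizes).
N, D = 10000, 5
X = rng.normal(size=(N, D))
theta_true = rng.normal(size=D)
# Labels in {-1, +1}, drawn from the logistic model.
y = (rng.random(N) < 1.0 / (1.0 + np.exp(-X @ theta_true))).astype(float) * 2 - 1

def log_likelihood(theta, X, y, w=None):
    """Weighted logistic log-likelihood: sum_n w_n * log sigma(y_n * x_n . theta)."""
    if w is None:
        w = np.ones(len(y))
    z = y * (X @ theta)
    return -np.sum(w * np.log1p(np.exp(-z)))

# Importance-sample a coreset of size M: draw index n with probability p_n,
# then weight it by 1 / (M * p_n) so the weighted sum is unbiased for the full sum.
# (p_n is uniform here for simplicity; the paper derives non-uniform probabilities
# from per-datapoint sensitivity bounds.)
M = 500
p = np.full(N, 1.0 / N)
idx = rng.choice(N, size=M, p=p)
w = 1.0 / (M * p[idx])

theta = rng.normal(size=D)
full = log_likelihood(theta, X, y)
core = log_likelihood(theta, X[idx], y[idx], w)
# `core` approximates `full`, so the weighted coreset can stand in for the
# dataset inside any likelihood-based posterior inference algorithm.
```

Because the coreset likelihood has the same functional form as the full likelihood (just with weights), it plugs into existing inference algorithms unchanged, which is the "without modification" claim in the abstract.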
Reviews: Coresets for Scalable Bayesian Logistic Regression
Note that in minibatch inference methods, a small subset of the data is sampled from the full dataset at each iteration and used to make an update; these methods take advantage of redundancy in the data to perform inexpensive updates. In this paper, coresets reduce the total dataset size by, in some sense, approximating the dataset with a smaller group of weighted examples. However, when coresets are used in existing inference algorithms (such as these minibatch algorithms), it seems to me that a very similar procedure will occur: a small subset of this approximate, weighted dataset will be drawn and used to make an update. I am not convinced this would actually speed up inference. In a sense, I feel that the main thing happening here is that the data is approximated in a smaller, compressed form; I can see how this might help with data storage concerns, but I don't see a strong justification for why it would appreciably speed inference over existing minibatch methods, especially considering that a coreset must be constructed before inference can proceed, which adds to this method's total inference time. One way to demonstrate a speedup would be timing comparison plots that explicitly show coresets yielding faster inference on large datasets than minibatch methods; however, no direct experiments of this sort are given.
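The reviewer's concern can be made concrete with a sketch: a stochastic-gradient update drawn from a weighted coreset has the same per-iteration cost as one drawn from the full data with unit weights, since the batch size, not the dataset size, drives the cost. This is a hypothetical illustration (all names, sizes, and step sizes are made up), not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def grad_log_lik(theta, Xb, yb, wb):
    """Gradient of the weighted logistic log-likelihood on a (mini)batch."""
    z = yb * (Xb @ theta)
    s = 1.0 / (1.0 + np.exp(z))  # equals 1 - sigma(z)
    return (wb * yb * s) @ Xb

def sgd_step(theta, X, y, w, batch_size, step):
    """One stochastic-gradient ascent step. Rescaling by len(y)/batch_size
    makes the batch gradient an unbiased estimate of the full weighted gradient."""
    idx = rng.choice(len(y), size=batch_size, replace=False)
    g = grad_log_lik(theta, X[idx], y[idx], w[idx]) * (len(y) / batch_size)
    return theta + step * g

# A small weighted "coreset" standing in for a much larger dataset:
# 200 points, each carrying weight 50 (i.e. representing ~10000 points).
Xc = rng.normal(size=(200, 2))
yc = np.where(Xc[:, 0] + rng.normal(size=200) > 0, 1.0, -1.0)
wc = np.full(200, 50.0)

theta = np.zeros(2)
for _ in range(100):
    theta = sgd_step(theta, Xc, yc, wc, batch_size=20, step=1e-5)

# Per-iteration cost above depends only on batch_size, whether (Xc, yc, wc)
# is a coreset or the full dataset with unit weights -- which is exactly the
# reviewer's question about speedups over plain minibatching. What the coreset
# changes is the storage footprint and the cost of any full-data passes.
```

One counterpoint the sketch also suggests: if the coreset itself is small enough to fit in memory, "minibatches" can even be replaced by full passes over the coreset, which is where a speedup relative to minibatching the original data could arise.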
- Research Report > New Finding (0.52)
- Research Report > Experimental Study (0.41)
Coresets for Scalable Bayesian Logistic Regression
Jonathan Huggins, Trevor Campbell, Tamara Broderick
- Research Report > New Finding (0.44)
- Research Report > Experimental Study (0.44)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.75)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.61)